Gaze Estimation Based on Multi-view Geometric Neural Networks
Gaze and head pose estimation can play essential roles in various applications, such as human attention recognition and behavior analysis. Most deep neural network-based gaze estimation techniques use supervised regression, where features extracted from eye images by neural networks are regressed to 3D gaze vectors. We instead apply the geometric features of the eyes to determine the gaze vectors of observers, relying on the concepts of 3D multiple view geometry. We develop an end-to-end CNN framework for gaze estimation using 3D geometric constraints under semi-supervised and unsupervised settings and compare the results. We explore the mathematics behind homography and Structure-from-Motion and extend these concepts to the gaze estimation problem using eye region landmarks. We demonstrate the necessity of 3D eye region landmarks for implementing 3D geometry-based algorithms and address the lack of depth parameters in existing gaze estimation datasets. We further explore the use of Convolutional Neural Networks (CNNs) to develop an end-to-end learning-based framework that takes in sequential eye images to estimate the relative gaze changes of observers. We use a depth network for monocular depth estimation of the eye region landmarks, which are further utilized by a pose network to estimate the relative gaze change using view synthesis constraints on the iris regions. We also explore CNN frameworks that estimate the relative changes in homography matrices between sequential eye images based on the eye region landmarks, thereby estimating the pose of the iris and hence the relative change in the observer's gaze. We compare and analyze the results obtained from mathematical calculations and deep neural network-based methods, and further compare the performance of the proposed CNN scheme with state-of-the-art regression-based methods for gaze estimation.
Future work involves extending the end-to-end pipeline as an unsupervised framework for gaze estimation in the wild.
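The homography-based component of the abstract above relies on estimating a planar mapping between eye region landmarks in sequential frames. A minimal sketch of that building block is the classical normalized DLT homography fit from 2D correspondences; the function name and interface here are illustrative, not the paper's actual implementation:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via the normalized DLT.

    src, dst: (N, 2) arrays of corresponding 2D landmarks, N >= 4.
    """
    def normalize(pts):
        # Translate centroid to origin; scale so the mean distance is sqrt(2)
        c = pts.mean(axis=0)
        s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
        T = np.array([[s, 0, -s * c[0]],
                      [0, s, -s * c[1]],
                      [0, 0, 1.0]])
        ph = np.hstack([pts, np.ones((len(pts), 1))]) @ T.T
        return ph, T

    sh, Ts = normalize(np.asarray(src, float))
    dh, Td = normalize(np.asarray(dst, float))
    A = []
    for (x, y, w), (u, v, _) in zip(sh, dh):
        # Two linear constraints per correspondence on the 9 entries of H
        A.append([0, 0, 0, -x, -y, -w, v * x, v * y, v * w])
        A.append([x, y, w, 0, 0, 0, -u * x, -u * y, -u * w])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)       # null-space vector = homography entries
    H = np.linalg.inv(Td) @ H @ Ts  # undo the normalizing transforms
    return H / H[2, 2]
```

Given tracked eye region landmarks in two frames, the recovered H encodes the relative pose change of the (approximately planar) landmark configuration; the CNN variants described above learn this mapping rather than solving it algebraically.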
GraffMatch: Global Matching of 3D Lines and Planes for Wide Baseline LiDAR Registration
Using geometric landmarks like lines and planes can increase navigation
accuracy and decrease map storage requirements compared to commonly-used LiDAR
point cloud maps. However, landmark-based registration for applications like
loop closure detection is challenging because a reliable initial guess is not
available. Global landmark matching has been investigated in the literature,
but these methods typically use ad hoc representations of 3D line and plane
landmarks that are not invariant to large viewpoint changes, resulting in
incorrect matches and high registration error. To address this issue, we adopt
the affine Grassmannian manifold to represent 3D lines and planes and prove
that the distance between two landmarks is invariant to rotation and
translation if a shift operation is performed before applying the Grassmannian
metric. This invariance property enables the use of our graph-based data
association framework for identifying landmark matches that can subsequently be
used for registration in the least-squares sense. Evaluated on a challenging
landmark matching and registration task using publicly-available LiDAR
datasets, our approach yields a 1.7x and 3.5x improvement in successful
registrations compared to methods that use viewpoint-dependent centroid and
"closest point" representations, respectively.
Comment: accepted to RA-L; 8 pages.
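The core idea above is to treat 3D lines and planes as points on the affine Grassmannian and compare them with a subspace metric. A minimal numpy sketch of the standard embedding (an affine k-dimensional subspace of R^n as a (k+1)-dimensional linear subspace of R^(n+1)) and the principal-angle distance is below; note this sketch omits the paper's shift operation, so the plain metric shown here is not by itself invariant to translation:

```python
import numpy as np

def affine_grassmannian_point(basis, point):
    """Embed the affine subspace {point + span(basis)} of R^n as an
    orthonormal basis of a linear subspace of R^(n+1), i.e. a point
    on the Grassmannian Gr(k+1, n+1). A line has k=1, a plane k=2."""
    B = np.atleast_2d(np.asarray(basis, float))  # (k, n) spanning directions
    p = np.asarray(point, float)
    n = p.size
    M = np.zeros((n + 1, B.shape[0] + 1))
    M[:n, :-1] = B.T          # direction vectors, homogeneous coord 0
    M[:n, -1] = p             # a point on the subspace, homogeneous coord 1
    M[n, -1] = 1.0
    Q, _ = np.linalg.qr(M)    # orthonormalize the spanning set
    return Q

def grassmannian_distance(Qa, Qb):
    """Geodesic distance: 2-norm of the principal angles between subspaces."""
    sv = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    angles = np.arccos(np.clip(sv, -1.0, 1.0))
    return np.linalg.norm(angles)
```

Identical landmarks map to distance zero regardless of how their basis is parameterized, which is what makes a subspace representation attractive for matching compared to ad hoc centroid or closest-point encodings.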
Stereo Visual Odometry with Deep Learning-Based Point and Line Feature Matching using an Attention Graph Neural Network
Robust feature matching forms the backbone for most Visual Simultaneous
Localization and Mapping (vSLAM), visual odometry, 3D reconstruction, and
Structure from Motion (SfM) algorithms. However, recovering feature matches
from texture-poor scenes is a major challenge and still remains an open area of
research. In this paper, we present a Stereo Visual Odometry (StereoVO)
technique based on point and line features which uses a novel feature-matching
mechanism based on an Attention Graph Neural Network that is designed to
perform well even under adverse weather conditions such as fog, haze, rain, and
snow, and dynamic lighting conditions such as nighttime illumination and glare
scenarios. We perform experiments on multiple real and synthetic datasets to
validate the ability of our method to perform StereoVO under low visibility
weather and lighting conditions through robust point and line matches. The
results demonstrate that our method achieves more line feature matches than
state-of-the-art line matching algorithms; when complemented with point
feature matches, these perform consistently well in adverse weather and
dynamic lighting conditions.
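The attention graph neural network matcher described above is a learned model and cannot be reproduced in a few lines; for context, a minimal numpy sketch of the classical baseline such matchers improve upon, mutual nearest-neighbour descriptor matching with a ratio test, is shown below. Function and parameter names are illustrative:

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b, ratio=0.8):
    """Mutual nearest-neighbour matching with Lowe's ratio test.

    desc_a: (Na, D) and desc_b: (Nb, D) L2-normalized feature descriptors.
    Returns a list of (i, j) index pairs into desc_a and desc_b.
    """
    sim = desc_a @ desc_b.T        # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)     # best match in b for each a
    nn_ba = sim.argmax(axis=0)     # best match in a for each b
    matches = []
    for i, j in enumerate(nn_ab):
        if nn_ba[j] != i:          # matches must agree in both directions
            continue
        row = np.sort(sim[i])[::-1]
        if len(row) > 1 and row[1] / max(row[0], 1e-9) > ratio:
            continue               # second-best too close: ambiguous match
        matches.append((i, int(j)))
    return matches
```

Learned attention-based matchers replace the fixed similarity-and-ratio heuristic with context-dependent scores, which is why they recover far more matches in texture-poor and low-visibility scenes.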